Enhanced Infrastructure for Creation and Collection of Translation Resources

Authors

  • Zhiyi Song
  • Stephanie Strassel
  • Gary Krug
  • Kazuaki Maeda
Abstract

Statistical Machine Translation (MT) systems have achieved impressive results in recent years, due in large part to the increasing availability of parallel text for system training and development. This paper describes recent efforts at the Linguistic Data Consortium (LDC) to create linguistic resources for MT, including corpora, specifications and resource infrastructure. We review LDC's three-pronged approach to parallel text corpus development (acquiring existing parallel text from known repositories, harvesting and aligning potentially parallel documents from the web, and having professional translators create parallel text manually), and describe recent adaptations that have significantly expanded the scope, variety, quality, efficiency and cost-effectiveness of translation resource creation at LDC.





Publication date: 2010